Composable Deep Reinforcement Learning for Robotic Manipulation
Model-free deep reinforcement learning has been shown to exhibit good
performance in domains ranging from video games to simulated robotic
manipulation and locomotion. However, model-free methods are known to perform
poorly when the interaction time with the environment is limited, as is the
case for most real-world robotic tasks. In this paper, we study how maximum
entropy policies trained using soft Q-learning can be applied to real-world
robotic manipulation. The application of this method to real-world manipulation
is facilitated by two important features of soft Q-learning. First, soft
Q-learning can learn multimodal exploration strategies by learning policies
represented by expressive energy-based models. Second, we show that policies
learned with soft Q-learning can be composed to create new policies, and that
the optimality of the resulting policy can be bounded in terms of the
divergence between the composed policies. This compositionality provides an
especially valuable tool for real-world manipulation, where constructing new
policies by composing existing skills can provide a large gain in efficiency
over training from scratch. Our experimental evaluation demonstrates that soft
Q-learning is substantially more sample efficient than prior model-free deep
reinforcement learning methods, and that policy composition is effective for
both simulated and real-world tasks.
Comment: Videos: https://sites.google.com/view/composing-real-world-policies
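For intuition, here is a minimal sketch of the composition idea in a discrete toy setting: under maximum entropy RL, a soft Q-function induces a Boltzmann policy, and averaging the Q-functions of two skills yields a composed policy proportional to the product of the skill policies. The Q-tables, temperature, and skill labels below are illustrative stand-ins, not the paper's actual continuous, energy-based setup.

```python
import numpy as np

ALPHA = 0.1  # entropy temperature (illustrative value)

def soft_policy(q_values, alpha=ALPHA):
    """Maximum entropy (Boltzmann) policy: pi(a) proportional to exp(Q(a)/alpha)."""
    logits = q_values / alpha
    logits -= logits.max()          # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Hand-made Q-values for two separately trained skills over the same 4 actions.
q_skill_1 = np.array([1.0, 0.2, 0.5, 0.1])   # e.g. "reach the object"
q_skill_2 = np.array([0.3, 0.9, 0.6, 0.1])   # e.g. "avoid the obstacle"

# Composing by averaging Q-functions makes the composed policy
# proportional to the product of the two skill policies.
q_composed = 0.5 * (q_skill_1 + q_skill_2)

for name, q in [("skill 1", q_skill_1), ("skill 2", q_skill_2),
                ("composed", q_composed)]:
    print(name, soft_policy(q).round(3))
```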
Quality Diversity for Multi-task Optimization
Quality Diversity (QD) algorithms are a recent family of optimization algorithms that search for a large set of diverse but high-performing solutions. In some specific situations, they can solve multiple tasks at once. For instance, they can find the joint positions required for a robotic arm to reach a set of points, which can also be solved by running a classic optimizer for each target point. However, they cannot solve multiple tasks when the fitness needs to be evaluated independently for each task (e.g., optimizing policies to grasp many different objects). In this paper, we propose an extension of the MAP-Elites algorithm, called Multi-task MAP-Elites, that solves multiple tasks when the fitness function depends on the task. We evaluate it on a simulated parameterized planar arm (10-dimensional search space; 5000 tasks) and on a simulated 6-legged robot with legs of different lengths (36-dimensional search space; 2000 tasks). The results show that in both cases our algorithm outperforms the optimization of each task separately with the CMA-ES algorithm.
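As a rough illustration, the sketch below implements a bare-bones Multi-task MAP-Elites loop: one archive cell per task, parents selected from anywhere in the archive, and each offspring evaluated on a single randomly chosen task. The task-dependent fitness and the Gaussian mutation are placeholder assumptions, not the paper's benchmarks.

```python
import random

DIM, N_TASKS, BUDGET, SIGMA = 10, 50, 20_000, 0.2  # illustrative sizes

def fitness(x, task):
    # Placeholder task-dependent fitness: distance to a task-specific target.
    target = task / N_TASKS
    return -sum((xi - target) ** 2 for xi in x)

def mutate(x):
    return [xi + random.gauss(0.0, SIGMA) for xi in x]

archive = {}  # task id -> (solution, fitness): one elite per task

for _ in range(BUDGET):
    if archive:
        parent, _ = random.choice(list(archive.values()))  # select from any cell
        child = mutate(parent)
    else:
        child = [random.random() for _ in range(DIM)]
    task = random.randrange(N_TASKS)        # evaluate on one task only
    f = fitness(child, task)
    if task not in archive or f > archive[task][1]:
        archive[task] = (child, f)          # replace the elite if improved

print(f"{len(archive)}/{N_TASKS} tasks covered")
```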
Real-time Flexibility Feedback for Closed-loop Aggregator and System Operator Coordination
Aggregators have emerged as crucial tools for the coordination of
distributed, controllable loads. However, to be used effectively, aggregators
must be able to communicate the available flexibility of the loads they control
to the system operator in a manner that is both (i) concise enough to be
scalable to aggregators governing hundreds or even thousands of loads and (ii)
informative enough to allow the system operator to send control signals to the
aggregator that lead to optimization of system-level objectives, such as cost
minimization, and do not violate private constraints of the loads, such as
satisfying specific load demands. In this paper, we present the design of a
real-time flexibility feedback signal based on maximization of entropy. The
design provides a concise and informative signal that can be used by the system
operator to perform online cost minimization and real-time capacity estimation,
while provably satisfying the private constraints of the loads. In addition to
deriving analytic properties of the design, we illustrate the effectiveness of
the design using a dataset from an adaptive electric vehicle charging network.
Comment: The Eleventh ACM International Conference on Future Energy Systems (e-Energy'20)
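To make the idea concrete, here is a toy sketch of an entropy-style flexibility feedback signal under strong simplifying assumptions: discrete 0/1 charging, identical loads, and brute-force enumeration of feasible schedules. Weighting each aggregate power level by its count of feasible continuations is our illustrative reading of the maximum entropy design, not the paper's exact construction.

```python
from itertools import product
from collections import Counter

# Toy model: each of N loads needs E units of energy within T slots,
# drawing 0 or 1 unit per slot (all parameters are illustrative).
N_LOADS, T, E = 3, 4, 2

def feasible_schedules():
    """All joint schedules in which every load meets its demand exactly."""
    per_load = [s for s in product((0, 1), repeat=T) if sum(s) == E]
    return product(per_load, repeat=N_LOADS)

# Count feasible joint schedules by their aggregate power in slot 0, then
# normalize: power levels with more feasible continuations get more weight,
# giving the operator a concise picture of the fleet's flexibility.
counts = Counter(sum(load[0] for load in joint) for joint in feasible_schedules())
total = sum(counts.values())
feedback = {p: c / total for p, c in sorted(counts.items())}
print(feedback)
```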
Strategies for Using Proximal Policy Optimization in Mobile Puzzle Games
While traditionally a labour-intensive task, the testing of game content is
progressively becoming more automated. Among the many directions this
automation is taking, automatic play-testing is one of the most promising,
thanks in part to advances in supervised and reinforcement learning (RL)
algorithms. However, these types of algorithms, while extremely powerful, often
suffer in production environments due to issues with reliability and
transparency in their training and usage.
In this work, we investigate and evaluate strategies for applying the popular
RL method Proximal Policy Optimization (PPO) to a casual mobile puzzle game,
with a specific focus on improving its reliability during training and its
generalization during play.
We have implemented and tested a number of different strategies against a
real-world mobile puzzle game (Lily's Garden from Tactile Games). We isolated
the conditions that lead to failures in either training or generalization
during testing, and we identified a few strategies that ensure more stable
behaviour of the algorithm in this game genre.
Comment: 10 pages, 8 figures, to be published in the 2020 Foundations of Digital Games conference
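For reference, the core of PPO that this work builds on is the clipped surrogate objective, sketched below; the data and the clipping coefficient are made-up stand-ins, not the authors' training setup for Lily's Garden.

```python
import numpy as np

CLIP_EPS = 0.2  # clipping coefficient (a common default, illustrative here)

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, eps=CLIP_EPS):
    """Clipped policy loss: discourages updates that move the policy too far."""
    ratio = np.exp(log_probs_new - log_probs_old)    # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))  # negate to maximize

# Tiny usage example with made-up probabilities and advantages.
lp_old = np.log(np.array([0.25, 0.10, 0.40]))
lp_new = np.log(np.array([0.30, 0.05, 0.45]))
adv = np.array([1.0, -0.5, 2.0])
print(ppo_clip_loss(lp_new, lp_old, adv))
```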